Modular ML-Agents

ML-Agents Playing Soccer

What Are ML-Agents?

Ever wondered how to make game characters that learn and adapt? ML-Agents (Machine Learning Agents) is a toolkit developed by Unity that allows game developers and researchers to create intelligent agents using machine learning, artificial neural networks, and reinforcement learning. By leveraging Unity's simulation capabilities, developers can train agents using reinforcement learning algorithms to perform complex tasks and behaviors within a game environment. These agents learn by interacting with their environment, receiving rewards or penalties based on their actions, and gradually shaping their behavior toward achieving specific goals.

Why Modular?

Now, why go modular? Think of it like building with LEGO. Modular components allow for flexibility and reusability, letting you mix and match pieces to create something amazing. You can tweak individual parts without disrupting the entire system, making the development process more organized and scalable. It’s like having a toolkit where every piece is designed to fit perfectly, no matter how you rearrange them.

Observations, Actions, and Incentives

When creating an agent, developers define observations, actions, and rewards/penalties. Observations act as the inputs to the neural network, actions as its outputs, and rewards and penalties guide the agent's behavior by reinforcing or discouraging certain actions. Each agent's observations vary depending on what data it needs to understand its environment: an agent might need to know its own position, the direction and distance to a goal, or the positions of other agents. Based on these inputs, the agent takes actions, which also vary from agent to agent and environment to environment. Rewards and penalties then steer the agent toward the desired outcome.
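To make the loop concrete, here's a minimal Python sketch of the observe → act → reward cycle. None of these names come from the ML-Agents API; the "policy" is just a hand-written stand-in for a trained network, and the grid world is invented purely for illustration.

```python
# Hypothetical sketch of one training episode: observe, act, receive reward.

def observe(agent_pos, goal_pos):
    """Observations: the agent's own position plus the offset to the goal."""
    dx = goal_pos[0] - agent_pos[0]
    dy = goal_pos[1] - agent_pos[1]
    return [agent_pos[0], agent_pos[1], dx, dy]

def act(observation):
    """Stand-in policy: step one unit toward the goal along the larger axis."""
    _, _, dx, dy = observation
    if abs(dx) >= abs(dy):
        return (1 if dx > 0 else -1, 0)
    return (0, 1 if dy > 0 else -1)

def distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def reward(old_dist, new_dist):
    """Reinforce closing the distance to the goal; penalize everything else."""
    return 1.0 if new_dist < old_dist else -1.0

agent, goal = [0, 0], [3, 2]
total = 0.0
while agent != goal:
    obs = observe(agent, goal)
    move = act(obs)
    old = distance(agent, goal)
    agent[0] += move[0]
    agent[1] += move[1]
    total += reward(old, distance(agent, goal))
```

In a real setup the policy is a neural network whose weights are updated from the accumulated rewards, but the loop has the same shape.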

Originally, developers had to program each observation, action, and incentive for every agent individually. This was tedious and time-consuming. But with the modular approach, we can create an observation, action, or incentive as a component once, then attach it to any agents that need it.

An example of how we add observational components to an agent.

Imagine you have a set of observations like the agent's own positioning, the direction to the goal, distance to the goal, and other agents' positions. When defining an agent, you can simply add an observation component to its list of observations and move on with your life. These reusable observational components are programmed once and used across multiple agents.
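The pattern is roughly the following, sketched in Python rather than Unity C#. Every class and field name here is invented for illustration (this is not the ML-Agents API): each component knows how to collect its own slice of data, and the agent just concatenates whatever components it has been given.

```python
# Hypothetical modular observation components, composed into one input vector.

class ObservationComponent:
    def collect(self, agent, world):
        raise NotImplementedError

class SelfPosition(ObservationComponent):
    def collect(self, agent, world):
        return list(agent["position"])

class DirectionToGoal(ObservationComponent):
    def collect(self, agent, world):
        gx, gy = world["goal"]
        x, y = agent["position"]
        return [gx - x, gy - y]

class DistanceToGoal(ObservationComponent):
    def collect(self, agent, world):
        gx, gy = world["goal"]
        x, y = agent["position"]
        return [abs(gx - x) + abs(gy - y)]  # Manhattan distance, for simplicity

def build_observation_vector(agent, world, components):
    """Concatenate every component's output into the network's input vector."""
    vector = []
    for component in components:
        vector.extend(component.collect(agent, world))
    return vector

agent = {"position": (1, 2)}
world = {"goal": (4, 6)}
components = [SelfPosition(), DirectionToGoal(), DistanceToGoal()]
vector = build_observation_vector(agent, world, components)
```

Adding an observation to a new agent is then just adding an entry to its component list; the collection code is written once.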

An example of how we add action components to an agent.

The same goes for actions. Maybe an agent’s set of actions consists of moving forward and backward, rotating on the Y-axis, and jumping. You just add these components to the agent’s action list and then figure out how to reward or penalize the agent.
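Action components can follow the same shape, sketched below with invented names (again, not the actual ML-Agents API): each component consumes one value from the network's output and applies it to the agent.

```python
# Hypothetical modular action components: each maps one network output
# onto one behavior, so agents compose whichever actions they need.

class ActionComponent:
    def apply(self, agent, value):
        raise NotImplementedError

class MoveForwardBackward(ActionComponent):
    def apply(self, agent, value):
        # value in {-1, 0, 1}: backward, stay, forward
        agent["x"] += value

class RotateY(ActionComponent):
    def apply(self, agent, value):
        # value in {-1, 0, 1}: rotate 90 degrees left, none, or right
        agent["rotation_y"] = (agent["rotation_y"] + 90 * value) % 360

class Jump(ActionComponent):
    def apply(self, agent, value):
        if value:
            agent["jumping"] = True

def apply_actions(agent, components, network_outputs):
    """Pair each output of the policy with its action component."""
    for component, value in zip(components, network_outputs):
        component.apply(agent, value)

agent = {"x": 0, "rotation_y": 0, "jumping": False}
actions = [MoveForwardBackward(), RotateY(), Jump()]
apply_actions(agent, actions, [1, -1, 1])
```

The agent's action list also fixes the size of the network's output layer, since each component claims one output.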

It’s All About The Incentives

Incentives, like observations and actions, are modular too. Imagine creating an agent to learn how to play soccer against other agents. You might want to reward an agent for making a goal, touching the ball, or simply moving towards it. On top of that, each agent might have a different set of incentives. For example, a striker agent is incentivized to push the ball closer to the goal, while a goalie agent is incentivized to stay near their goal and defend it. Reward signaling is crucial to shaping the agent’s behavior; without it, the agent won't learn anything. So choosing the right set of incentives is more of an art than a science. We’ve made many incentive components so far, and it’s as simple as adding these components to the agent’s list of incentives. Our list of pre-programmed components is continuously growing, and you can use them too!
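The soccer example above can be sketched the same way. All names below are hypothetical (not the ML-Agents API): each incentive component scores one event, the total reward is their sum, and the striker and goalie simply hold different component lists.

```python
# Hypothetical modular incentive components; each scores one kind of event,
# and an agent's reward is the sum over its own incentive list.

class IncentiveComponent:
    def evaluate(self, agent, events):
        raise NotImplementedError

class GoalScored(IncentiveComponent):
    def evaluate(self, agent, events):
        return 1.0 if "goal_scored" in events else 0.0

class BallTouched(IncentiveComponent):
    def evaluate(self, agent, events):
        return 0.1 if "ball_touched" in events else 0.0

class MovedTowardBall(IncentiveComponent):
    def evaluate(self, agent, events):
        # Small shaping reward when the distance to the ball shrank this step.
        return 0.01 if events.get("delta_to_ball", 0) < 0 else 0.0

class StayedNearOwnGoal(IncentiveComponent):
    def evaluate(self, agent, events):
        return 0.01 if events.get("dist_to_own_goal", 99) < 5 else 0.0

def total_reward(agent, events, incentives):
    return sum(inc.evaluate(agent, events) for inc in incentives)

# Different agents, different incentive lists, same components.
striker = [GoalScored(), BallTouched(), MovedTowardBall()]
goalie = [StayedNearOwnGoal(), BallTouched()]

events = {"ball_touched": True, "delta_to_ball": -0.5}
striker_reward = total_reward({}, events, striker)
```

The magnitudes here are arbitrary; as noted above, tuning them is more art than science, but swapping components in and out makes that experimentation cheap.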

An example of how we add incentive components to an agent.

Looking Ahead

I’m super excited about releasing this modular approach along with the training environment. By providing developers, designers, and scientists with a growing library of observation (inputs), action (outputs), and incentive (optimization functions) components, you can more easily create complex agents. This system of building blocks will enable faster development and more efficient experimentation, pushing the boundaries of what we can achieve with ML-Agents in Unity.
